Method and device for error recovery

ABSTRACT

An Error Recovery Procedure (ERP) is disclosed which, during its execution, dynamically changes its error recovery steps while self-diagnosing the cause of error. An appropriate ERP is selected and executed according to the detected error status.

BACKGROUND OF THE INVENTION

1. Technical Field

The present invention is related to a storage device having an error recovery function. In particular, it is related to a disk storage device having an error recovery procedure (ERP) comprising a self-diagnostic feature.

2. Description of the Related Art

In disk storage devices, when reading data from a disk storage medium, defects of the disk, track mis-registration, and the like may result in read errors. When such errors occur, an error recovery procedure (ERP) comprising error recovery steps, such as retries and changes of parameters, is executed to recover from the errors and maintain high readability.

Usually, for errors in the data area, an ECC (error correction code), which is a general error recovery code, is used to execute the error recovery process. Furthermore, a variety of error recovery steps are executed, such as changing the reading gain, changing the offtrack amount, and, where a magneto-resistive (MR) element is used as the reading head, changing the bias value of the MR element. Once these error recovery steps are executed, the data is read again. If the rereading succeeds, the data continues to be used. If recovery by the error recovery procedure fails, the result is a hard error; alternatively, if the data can be re-recorded (reassigned) to another area, the area concerned on the disk is regarded as unavailable, and the data is reassigned to the other area.

Recent disk devices utilize a magneto-resistive (MR) head or a giant MR (GMR) head, which reads data by utilizing the property that the resistivity of the MR element changes with the magnetic field. However, one cause of reading errors in this method of reading changes of resistance is the Thermal Asperity (TA). A Thermal Asperity occurs when a projection on the disk collides with the reading head, and the resulting change of temperature causes a change of resistance in the MR element. An abnormal signal is thereby generated.

As a countermeasure to the error for this Thermal Asperity, there is a method of changing the circuit by constant filtering of the output signal of the head (making the frequency response faster) to relatively shorten the TA waveform so as to be enabled to read. This is also configured as a portion of the above error recovery procedure (ERP).

For the errors in reading and writing data, there are a variety of countermeasures as mentioned above. They are usually stored as a series of steps of the ERP. Once the ERP is started, these steps are executed sequentially.

As mentioned above, there are a variety of factors of error generation. Therefore, an ERP which is effective for these various error factors is required. The ERP typically executes rereading by changing and adjusting, one by one, the standard reading conditions defined among the disk, magnetic head, and HDC (hard disk controller). Here, the reading conditions are, for example, the amount of offtrack, which is the discrepancy between the center of the magnetic head and the center of the track; the value of the bias current supplied to the MR element, in the case where an MR element is provided as the magnetic head; the adjustment of the automatic gain control (AGC), which is provided to keep the amplitude of the regenerated signal constant; and the adjustment of the speed of the PLL circuit, which stabilizes the sampling frequency.

Usually, a plurality of error recovery steps are registered with the ERP. These steps are executed in a predetermined order. Each time a step is finished, a retry (rereading) is executed. The ERP is finished when the retry has succeeded. If the retry has not succeeded, the ERP is finished when the preset maximum number of retries is reached, or when the final step of the ERP is finished.
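The conventional sequential procedure described above can be illustrated with a minimal Python sketch. The function name `run_sequential_erp`, the step names, and the callable-based interface are hypothetical and chosen only for illustration; the actual procedure runs as microcode in the disk device.

```python
from typing import Callable, Sequence


def run_sequential_erp(
    steps: Sequence[Callable[[], None]],
    retry: Callable[[], bool],
    max_retries: int,
) -> bool:
    """Execute registered recovery steps in order, retrying after each one.

    Returns True as soon as a retry succeeds. Returns False when the
    maximum number of retries is reached or the final step is finished.
    """
    retries = 0
    for step in steps:
        if retries >= max_retries:
            break  # preset maximum number of retries reached
        step()  # e.g. change gain, offtrack amount, or MR bias (assumed names)
        retries += 1
        if retry():  # reread after every step
            return True
    return False


# Illustrative run: the reread succeeds after the second step.
executed = []
steps = [
    lambda name=name: executed.append(name)
    for name in ("change_gain", "change_offtrack", "change_mr_bias")
]
outcomes = iter([False, True])
recovered = run_sequential_erp(steps, lambda: next(outcomes), max_retries=5)
```

Because every step is followed by a retry, a step that is registered late is reached only after all earlier steps have failed, which is why the total execution time can grow long.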

The ERP also comprises steps requiring considerable time to be executed. Executing all of these error recovery steps may take from more than ten seconds to several tens of seconds.

Conventionally, the time-out period for a read instruction from the host system is typically about 30 seconds, but some recent systems time out earlier than that. Thus, some systems may be unable to execute all the steps of the ERP.

As a method for solving such a problem, in Published Unexamined Patent Application No. 10-134528, the applicant has suggested a method of changing the order of execution of each error recovery step based on the history of past errors. And, in Japanese Patent Application No. 8-307743, the applicant has suggested a method of continuing to execute the ERP until the final step even if the time out occurs.

However, recent ERPs comprise error recovery steps, such as the initialization of the GMR element, which are effective only for particular errors but which, if executed too frequently, risk accelerating the degradation of the head, etc. Such steps were registered as later steps in the ERP so as to reduce the frequency of their execution.

Moreover, among the conventional ERPs, there has been no ERP which dynamically changes the error recovery steps during execution while self-diagnosing the cause of error.

On the other hand, in the case where errors occur when writing data to a sector, rewriting is executed after the execution of the ERP, and if writing is still not possible, the data to be written is reassigned to another sector. In this case, there are mainly two causes, since the conventional ERPs are configured without reference to the precision of the Track Following: (1) Write Abort due to the TA and defects of the disk existing in a particular servo sector; and (2) Write Abort due to the degradation of the positioning of the head due to the RRO (Repeatable Run Out) component of the spindle motor.

In the case of (1), the probability that rewriting succeeds by the execution of the ERP is relatively low. It is more efficient to reassign immediately than to spend time executing the ERP.

In the case of (2), differing from the case of (1), there is no defect which makes it physically impossible to write. The possibility that the writing succeeds by execution of the ERP and rewriting is relatively high. Therefore, the ERP should be executed until a certain step or the final step.

In the prior art, it is not possible to determine which of the two cases mentioned above is the cause of error. Therefore, the ERP was executed indiscriminately even when immediate reassigning should have been executed, as in the case of (1).

SUMMARY OF THE INVENTION

An object of the present invention is to provide an ERP which, during execution of the ERP, while self-diagnosing the cause of error, changes dynamically the error recovery steps.

Another object of the present invention is to provide an ERP which can recover from the errors adequately and in a short time (with a few steps) according to the cause of error.

Furthermore, another object of the present invention is to provide an ERP in which the execution of the error recovery steps unsuitable for the cause of error is eliminated, and the degradation of the head, etc. caused by that can be prevented.

Furthermore, another object of the present invention is to provide a means of determining the cause of error.

Furthermore, another object of the present invention is to provide a means of selecting whether or not to execute the ERP according to the cause of error.

Furthermore, another object of the present invention is to make the response of the disk storage device to the host system faster by reassigning without execution of the ERP, in the case of an error due to a particular cause.

According to the present invention, the error status is detected, and according to the detected error status, an appropriate ERP is selected and executed.

And, in another embodiment of the present invention, the precision of the Positioning is measured during the Track Following, and when the measured value is equal to or more than a certain value, it is determined that the RRO component resulting from the spindle, etc. is large, and the ERP continues to be executed. In the case where it is determined as a result of the measurement that the TA or the defect of the disk exists in a particular servo sector, the countermeasure to TA is taken, and if recovery is still not attained, the ERP is terminated, and reassigning is conducted.

Furthermore, the method according to the present invention for executing the ERP comprising a plurality of error recovery steps in the storage device comprises the steps of detecting the error status, selecting the error recovery steps in response to the detected error status, and executing the selected error recovery steps. The device according to the present invention for executing the ERP comprising a plurality of error recovery steps comprises the means of detecting the error status, selecting the error recovery steps in response to the detected error status, and executing the selected error recovery steps.

And furthermore, the method according to the present invention for executing the ERP comprising a plurality of error recovery steps in the storage device comprises the steps of measuring the servo stability, selecting the error recovery steps in response to the measured servo stability, and executing the selected error recovery steps. The device according to the present invention for executing the ERP comprising a plurality of error recovery steps comprises the means of measuring the servo stability, selecting the error recovery steps in response to the measured servo stability, and executing the selected error recovery steps.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing a hard disk device (HDD) to which the present invention is applied;

FIG. 2 (being comprised of FIG. 2A and FIG. 2B) shows the process according to the present invention for executing the ERP;

FIG. 3 shows the process for measuring the servo stability;

FIG. 4 shows an example of the relationship between ape_off and sigma;

FIG. 5 shows an example of countermeasures to servo stability; and

FIG. 6 shows the process for determining the optimum reference value.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

FIG. 1 is a block diagram showing a hard disk device (HDD) to which the present invention is applied. The disk device 100 is composed of a controller portion 110 and a disk portion 130. The controller portion 110 comprises a host interface controller (HIC) 112 connected to a host system 10, a hard disk controller (HDC) 114 for controlling the disk portion, connected to the host interface controller 112, a channel 116 for controlling read and write signals, connected to the hard disk controller 114, an MPU 118 connected to the HIC 112, HDC 114, and channel 116 to control them, and a RAM 120 connected to the MPU 118 to store microcodes executed by the MPU. The disk portion 130 is provided with a motor 134 for rotating a spindle 132. Disks 136A and 136B are attached to the spindle 132 such that they rotate integrally with the spindle 132. Although two disks are shown in the figure, the number of disks may be one, or three or more.

Heads 138A, 138B, 138C, and 138D supported respectively by actuator arms 140A, 140B, 140C, and 140D are placed so that each of them is opposed to a surface of a disk. The actuator arms 140A to 140D are attached to a voice coil motor (VCM) 144 via a pivot shaft 142, and, by its rotary motion, the heads 138A to 138D are moved to desired radial positions on the disks. The motor 134 and the VCM 144 are connected to the HDC 114 to have their numbers of revolution and speeds, etc. controlled. The heads 138A to 138D are connected to the channel 116 to have read and write signals controlled by the channel 116.

In FIG. 2 (being comprised of FIG. 2A and FIG. 2B), a process according to the present invention for executing the ERP is shown. When a read or write instruction is provided to the disk device from the host, in step 200, a read or write operation is initiated in the disk device. In step 202, an SER (soft error rate) is measured, and the result of the measurement is logged. In step 204, it is checked whether any error has occurred or not, and if no error has occurred, the process is finished in step 206. If an error has occurred in step 204, the disk device receives an error status from the HDC in step 210.

In the following steps 220, 230, 240, and 250, an ERP is selected according to the error status received in step 210. In step 220, if the error status is an error due to the TA such as a TA bit, the error is determined to be one due to the TA, and a countermeasure to TA is taken in step 222. As a countermeasure to TA, one or more error recovery steps, such as high rotational speed reading or the like are adopted. After the error recovery steps according to the countermeasure to TA are finished, a retry of reading or writing is executed in step 224. If the retry succeeds in step 226, the process is finished in step 228, and if the retry fails, a “TA error” is returned back to the host in step 229.

In the case where the error status is not the TA bit in step 220, if, in step 230, the error status is a write error such as a write abort, the error is determined to be one due to the write abort, and a retry of writing is executed in step 232. If the retry succeeds in step 234, the process is finished in step 236. If the retry fails, a “write hard error” is returned back to the host in step 238.

In the case where the error status is not the write abort in step 230, if, in step 240, the error status is an external impact error, the error is determined to be one due to the external impact, and a retry of reading or writing is executed in step 242. If the retry succeeds in step 244, the process is finished in step 246. If the retry fails, an “external impact error” is returned back to the host in step 248. Now, the external impact herein represents that when an impact is sensed by, for example, an impact sensor, an error signal is generated, and the error status is regarded as the external impact.

In the case where the error status is not the external impact in step 240, if, in step 250, the error status is a head output error such as too little head output, the error is determined to be one due to too little head output, and a countermeasure to too little head output is taken in step 252. As a countermeasure to too little head output, one or more error recovery steps, such as low rotational speed reading and initialization of the GMR element, are adopted. After the error recovery steps according to the countermeasure to too little head output are finished, a retry of reading or writing is executed in step 254. If the retry succeeds in step 256, the process is finished in step 258. If the retry fails, a “head output error” is returned back to the host in step 259.

If, in steps 220 to 250, the error is determined not to be any of these as mentioned above, a normal ERP is executed in step 260. After the normal ERP is executed, a retry of reading or writing is executed in step 262. If the retry succeeds in step 264, the process is finished in step 266. If the retry fails, it is determined whether or not the error is due to a further cause of error in step 268. A servo stability is measured in step 268, and it is checked whether or not the cause of error is due to an instability of the positioning of the track. In step 270, it is checked whether or not the error is an SER error. If it is the SER error, the servo stability is determined to be insufficient, and a countermeasure to servo stability is taken in step 272. In the case where the error is not the SER error in step 270, or after the countermeasure to servo stability is taken in step 272, a retry of reading or writing is executed in step 274. If the retry succeeds in step 276, the process is finished in step 278. If the retry fails, the process is returned to step 268 again, and the servo stability is measured. Here, the error may be returned to the host to finish the process after a certain number of times of trial or a certain elapsed time so that the loop returning from step 276 to step 268 is not an infinite loop.
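The selection in steps 220 to 260 amounts to a dispatch on the error status. The following Python sketch shows one possible way to express it. The enum `ErrorStatus`, the table `COUNTERMEASURES`, the step-name strings, and the functions `select_erp` and `recover` are all hypothetical names introduced for illustration; only the countermeasures and the errors returned to the host are taken from the description of FIG. 2.

```python
from enum import Enum, auto
from typing import Callable, List, Optional, Tuple


class ErrorStatus(Enum):
    TA = auto()               # step 220: e.g. a TA bit
    WRITE_ABORT = auto()      # step 230
    EXTERNAL_IMPACT = auto()  # step 240
    HEAD_OUTPUT = auto()      # step 250: too little head output
    OTHER = auto()            # none of the above: normal ERP (step 260)


# status -> (recovery steps before the retry, error returned to host on failure)
COUNTERMEASURES = {
    ErrorStatus.TA: (["high_rotational_speed_read"], "TA error"),
    ErrorStatus.WRITE_ABORT: ([], "write hard error"),
    ErrorStatus.EXTERNAL_IMPACT: ([], "external impact error"),
    ErrorStatus.HEAD_OUTPUT: (
        ["low_rotational_speed_read", "initialize_gmr_element"],
        "head output error",
    ),
}


def select_erp(status: ErrorStatus) -> Tuple[List[str], Optional[str]]:
    """Pick the recovery steps for a status; fall back to the normal ERP."""
    return COUNTERMEASURES.get(status, (["normal_erp"], None))


def recover(
    status: ErrorStatus,
    run_step: Callable[[str], None],
    retry: Callable[[], bool],
) -> str:
    """Run the selected steps, retry once, and report the outcome."""
    steps, host_error = select_erp(status)
    for name in steps:
        run_step(name)
    if retry():
        return "ok"
    if host_error is None:
        # Normal ERP failed: continue with the servo stability check (step 268).
        return "measure_servo_stability"
    return host_error


executed: List[str] = []
result = recover(ErrorStatus.HEAD_OUTPUT, executed.append, lambda: True)
```

Keeping the mapping in a table makes it straightforward to add other error statuses or recovery steps, which the embodiment explicitly allows.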

As described above, the self-diagnosis of the cause of error and the error recovery steps corresponding to that cause according to the present invention may be arranged before or after the normal ERP according to the prior art, or may be arranged to replace the normal ERP. Although an example of the error statuses received in steps 220 to 259 and the error recovery steps corresponding to them is shown in this embodiment, other error statuses or error recovery steps may be used.

Although the case of the operation of reading or writing is explained, the present invention may also be applied to the case of a Seek operation. For example, in the case of a Settling error, the property of the filter is changed. In the case of the servo stability error, depending on whether a particular frequency component is put on it, or the stability is degraded over the whole frequency range, the corresponding frequency component is filtered, or the ERP corresponding to the degradation of the head property is executed.

Now, the measurement of the servo stability will be explained in more detail.

In FIG. 3, the measurement of the servo stability is initiated in step 300. In step 302, a value of a variable “Intg” representing the integral of the absolute value of the positioning error of the servo is initialized to a value of a constant Ini_intg. In step 304, an “APE” representing the absolute value of the positioning error is compared with the Intg. In the case where the head is deviated from the center of the track in one direction (for example, inward of the disk), the value of the Intg is greater than the APE, so the process advances to step 308, and a certain amount, delta, is subtracted from the value of the Intg. In the case where the head is deviated from the center of the track in the other direction (for example, outward of the disk), or the head is not deviated from it, the value of the Intg is smaller than or equal to the APE, and a certain amount, delta, is added to the value of the Intg in step 306.

After the step 306 or 308 is finished, the process advances to step 310. The absolute value of a difference between the value of the nth integral Intg(n) and the value of the (n−1)th integral Intg(n−1) is compared with a reference value. If the absolute value of the concerned difference is smaller than the reference value, the value of integral Intg is determined to have converged, and the Intg(n) is set as the positioning value in step 312. The process is then finished in step 314. If the absolute value of the concerned difference is greater than or equal to the reference value, the process is returned to step 302. The steps 302 to 310 are then repeated until it converges. In this way, a value corresponding to the positioning value is found.
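The update in steps 304 to 308 and the convergence test in step 310 can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `update_intg` and `measure_positioning_value` are hypothetical, Intg(n) is assumed to be sampled once per disk revolution, and Intg is assumed to be carried over between passes rather than re-initialized. None of these details is fixed by the description of FIG. 3.

```python
from typing import Iterable, Optional, Sequence


def update_intg(intg: float, ape: float, delta: float) -> float:
    """Steps 304 to 308: move Intg toward the APE by a fixed amount delta."""
    if intg > ape:
        return intg - delta  # step 308
    return intg + delta      # step 306


def measure_positioning_value(
    ape_by_revolution: Iterable[Sequence[float]],
    ini_intg: float,
    delta: float,
    reference: float,
) -> Optional[float]:
    """Track the APE samples and return Intg once it has converged.

    Assumption: Intg(n) is the value at the end of the nth revolution.
    Returns None if no convergence is seen within the supplied revolutions.
    """
    intg = ini_intg  # step 302
    previous = None
    for samples in ape_by_revolution:
        for ape in samples:
            intg = update_intg(intg, ape, delta)
        # Step 310: compare |Intg(n) - Intg(n-1)| with the reference value.
        if previous is not None and abs(intg - previous) < reference:
            return intg  # step 312
        previous = intg
    return None


# Illustrative input: five revolutions of eight samples with a constant APE.
positioning_value = measure_positioning_value(
    [[10] * 8] * 5, ini_intg=0, delta=1, reference=0.5
)
```

With a fixed delta, Intg climbs toward the typical APE level and then oscillates around it by at most delta, so the end-of-revolution values stop changing once the level is reached.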

In FIG. 4, an example of the relationship, which is obtained by the inventors' experiment, between the value ape_off obtained by the above-mentioned value of integral of the positioning error Intg divided by a gain and the error distribution sigma is shown. It is confirmed by this figure that there is a positive correlation between the ape_off calculated from the above positioning values and the error distribution.

In FIG. 5, an example of the countermeasure to servo stability employing the above-mentioned correlation is shown. The process is initiated in step 500, and in step 502, the ape_off is calculated using the method as described in FIG. 3. As mentioned above, the ape_off converges to the value corresponding to the error distribution. In step 504, the process waits for the ape_off to converge. For example, in a writing ERP comprising a loop comprising a series of ERP steps, since this loop is executed one time, until the disk rotates a certain number of times (for example ten times), or a certain time elapses, the process waits for the ape_off to converge.

Next, in step 506, the value of the ape_off is preserved in an error distribution table. It is desirable to preserve this value for each head, and for each zone. In step 508, the ape_off is compared with a predetermined reference value of an error distribution, and if the ape_off is greater than it, it is determined to be the error distribution also comprising the RRO, the number of times of the loop is maximized in step 510, and the process is finished in step 514. If, in step 508, the ape_off is smaller than or equal to it, the cause of error is determined not to be the error distribution, and the dynamic ERP as shown in FIG. 2 is applied in step 512, and in the case of still not recovering from the error, after reassigning, the process is finished in step 514. In the latter case, the number of times of the ERP loop is minimized.
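The decision in steps 506 to 512 can be summarized in a short Python sketch. The names `RecoveryPlan`, `record_ape_off`, and `plan_write_recovery`, as well as the concrete loop counts passed in, are hypothetical; the description only states that ape_off is Intg divided by a gain, that it is stored per head and per zone, and that the loop count is maximized or minimized depending on the comparison with the reference value.

```python
from typing import Dict, NamedTuple, Tuple


class RecoveryPlan(NamedTuple):
    ape_off: float
    erp_loops: int          # number of times the ERP loop may run
    use_dynamic_erp: bool   # apply the status-based ERP of FIG. 2


def record_ape_off(
    table: Dict[Tuple[int, int], float], head: int, zone: int, ape_off: float
) -> None:
    """Step 506: preserve ape_off per head and per zone."""
    table[(head, zone)] = ape_off


def plan_write_recovery(
    intg: float, gain: float, reference: float, max_loops: int, min_loops: int
) -> RecoveryPlan:
    """Steps 508 to 512: choose the loop count from the converged ape_off."""
    ape_off = intg / gain
    if ape_off > reference:
        # Error distribution including RRO: keep looping (step 510).
        return RecoveryPlan(ape_off, max_loops, False)
    # Not a positioning problem: dynamic ERP, then reassign if it still fails.
    return RecoveryPlan(ape_off, min_loops, True)


error_distribution_table: Dict[Tuple[int, int], float] = {}
plan = plan_write_recovery(intg=40.0, gain=4.0, reference=5.0, max_loops=8, min_loops=1)
record_ape_off(error_distribution_table, head=0, zone=3, ape_off=plan.ape_off)
```

An ape_off exactly equal to the reference value falls into the second branch, matching the "smaller than or equal to" wording of step 508.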

In FIG. 6, the process for determining the optimum reference value in FIG. 5 is shown. In this figure, the axis of ordinates represents the number of ERP steps executed until the recovery from the error is attained, and the axis of abscissas represents the value of the ape_off. A group A has small values of the ape_off, and requires a relatively large number of the ERP steps until the recovery from the errors is attained. A group B has large values of the ape_off, and recovers from the errors with a relatively small number of the ERP steps. A group C has large values of the ape_off and reaches the maximum number of the ERP steps, and therefore the recovery from the errors is not attained and reassigning is executed.

In further analysis, the group A may be considered to have small RROs since their values of the ape_off are small, so it is considered that they are errors due to other factors such as the TA, rather than due to the error distribution, and therefore the possibility that the recovery from the errors can be attained by executing the same ERP loop again and again is low. On the other hand, the groups B and C, since they have large values of the ape_off, may be considered to be errors due to the error distribution, which are greatly affected by the RRO, and therefore the possibility that the recovery from the errors can be attained by executing the same ERP loop several more times is high. Here, although the group C results in being reassigned, it is difficult to distinguish between it and the group B only by the value of the ape_off.

Therefore, as shown in FIG. 6, by defining the reference value as the value of the ape_off which divides between the group A and the groups B and C, it is possible to take a countermeasure to other errors or execute reassigning early without executing the same ERP loop for the group A again and again, and it is expected that performance is improved. Furthermore, with respect to the group B, the recovery from the errors can be attained by executing the same ERP loop several more times.
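One simple way to obtain a dividing value from measured data is sketched below in Python. It assumes that the ape_off samples have already been labeled as group A or as groups B and C, and that the two sets do not overlap. The function name `reference_from_groups` and the choice of the midpoint are assumptions made for illustration; the description does not specify how the dividing value is computed.

```python
from typing import Sequence


def reference_from_groups(
    group_a: Sequence[float], groups_bc: Sequence[float]
) -> float:
    """Return an ape_off value that separates group A from groups B and C.

    Assumption: the midpoint between the largest ape_off in group A and
    the smallest ape_off in groups B and C is used as the reference value.
    """
    upper_a = max(group_a)
    lower_bc = min(groups_bc)
    if upper_a >= lower_bc:
        raise ValueError("groups overlap in ape_off; no single dividing value")
    return (upper_a + lower_bc) / 2


# Illustrative ape_off values only; they are not measured data.
reference = reference_from_groups([1.0, 2.0], [4.0, 6.0])
```

If the groups overlap in practice, a different criterion would be needed, and the sketch raises an error instead of guessing.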

INDUSTRIAL APPLICABILITY

According to the present invention, it is possible, while self-diagnosing the cause of error, to change dynamically the error recovery steps during execution of the ERP.

Also according to the present invention, it is possible to recover from the errors adequately and in a short time (with a few steps) according to the cause of error.

Furthermore according to the present invention, it is possible to eliminate the execution of the error recovery steps unsuitable for the cause of error, and prevent the degradation of the head, etc. caused by that.

Furthermore according to the present invention, it is possible to determine the cause of error.

Furthermore according to the present invention, it is possible to select whether or not to execute the ERP according to the cause of error.

Furthermore according to the present invention, it is possible, in the case of error due to a particular cause, to make the response of the disk storage device to the host system faster by reassigning without execution of the ERP.

What is claimed is:
 1. A method of executing an error recovery procedure (ERP) including a plurality of error recovery steps in a storage device, comprising the steps of: calculating a discrepancy of a positioning error; converging said discrepancy of a positioning error; determining that said converged discrepancy of the positioning error is greater than a predetermined reference value; and maximizing a number of times of an ERP loop.
 2. The method of executing an error recovery procedure according to claim 1, wherein said step of calculating the discrepancy of the positioning error comprises the steps of: calculating a value of an integral of the positioning error; and correcting said calculated value of an integral of the positioning error by dividing it by a gain.
 3. The method of executing an error recovery procedure according to claim 1, wherein said step of converging the discrepancy of the positioning error comprises the step of waiting for a disk included in said storage device to rotate a predetermined number of times.
 4. The method of executing an error recovery procedure according to claim 1, wherein said step of converging the discrepancy of the positioning error comprises the step of waiting for a disk included in said storage device to rotate for a predetermined time.
 5. A method of executing an error recovery procedure (ERP) including a plurality of error recovery steps in a storage medium, comprising the steps of: calculating a discrepancy of a positioning error; converging said discrepancy of the positioning error; determining that said converged discrepancy of the positioning error is not greater than a predetermined reference value; and minimizing a number of times of an ERP loop.
 6. A device for executing an error recovery procedure (ERP) including a plurality of error recovery steps, comprising: means for calculating a discrepancy of a positioning error; means for converging said discrepancy of the positioning error; means for determining that said converged discrepancy of the positioning error is greater than a predetermined reference value; and means for maximizing a number of times of an ERP loop.
 7. A device for executing an error recovery procedure (ERP) including a plurality of error recovery steps, comprising: means for calculating a discrepancy of a positioning error; means for converging said discrepancy of the positioning error; means for determining that said converged discrepancy of the positioning error is not greater than a predetermined reference value; and means for minimizing a number of times of an ERP loop.